Counting Duplicate Words and Characters in Java
Given the sentence “Simple problem but tricky problem”, count the duplicate words and characters. Let’s look at some of the ways we could solve this using Java.
Solution 1
Using Java 8
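// Requires java.util.Map and java.util.stream.Collectors
public static Map<Character, Long> duplicateCharactersCountingJ8(String words) {
    return words
            .chars()
            .filter(ch -> !Character.isWhitespace(ch))   // skip whitespace
            .mapToObj(ch -> (char) ch)                    // IntStream -> Stream<Character>
            .collect(Collectors.groupingBy(Character::toLowerCase, Collectors.counting()));
}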
This solution uses Java 8 streams.
- Transform the given sentence into an IntStream, which is an integer representation of the characters in the sentence, by calling the .chars() method.
- Filter out any whitespace, as there is no point in counting it.
- Convert the IntStream into a stream of characters by calling mapToObj().
- Then group the stream of characters using groupingBy() and count the occurrences of each character using Collectors.counting(), which is equivalent to Collectors.reducing(0L, e -> 1L, Long::sum).
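The equivalence between Collectors.counting() and the explicit reduction can be checked with a small, self-contained sketch. The class and method names below are hypothetical and exist only for this illustration; lowercasing is done before grouping so that both variants group on the same keys.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
class CharCountSketch {

    // Counts non-whitespace characters with Collectors.counting().
    static Map<Character, Long> withCounting(String text) {
        return text.chars()
                .filter(ch -> !Character.isWhitespace(ch))
                .mapToObj(ch -> (char) Character.toLowerCase(ch))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Same counts, spelling out the reduction: start at 0, map each element to 1, sum.
    static Map<Character, Long> withReducing(String text) {
        return text.chars()
                .filter(ch -> !Character.isWhitespace(ch))
                .mapToObj(ch -> (char) Character.toLowerCase(ch))
                .collect(Collectors.groupingBy(Function.identity(),
                        Collectors.reducing(0L, e -> 1L, Long::sum)));
    }

    public static void main(String[] args) {
        String sentence = "Simple problem but tricky problem";
        // Both collectors should produce identical maps.
        System.out.println(withCounting(sentence).equals(withReducing(sentence))); // prints true
    }
}
```

In practice counting() is the more readable choice; the reducing() form is mainly useful for seeing what the collector does.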
We could apply the same principle to word counting, with a slightly different approach that still uses Collectors:
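// Requires java.util.Map, java.util.regex.Pattern and java.util.stream.Collectors
public static Map<String, Integer> duplicateWordsCounting(String inputString) {
    return Pattern.compile("\\W+")                 // split on runs of non-word characters
            .splitAsStream(inputString.trim())
            .collect(Collectors.groupingBy(String::toLowerCase, Collectors.summingInt(s -> 1)));
}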
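The Map returned by groupingBy() makes no promise about iteration order. If alphabetical output matters, one option is to pass a TreeMap as the map factory. The following is a sketch of that variant rather than the method above: the class and method names are hypothetical, the counts are Long because it uses counting(), and the extra filter is a defensive assumption for input that starts with punctuation.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class and method names, used only for illustration.
class SortedWordCountSketch {

    // Counts words case-insensitively and returns them sorted by key.
    static Map<String, Long> sortedWordCounts(String input) {
        return Pattern.compile("\\W+")
                .splitAsStream(input.trim())
                .filter(word -> !word.isEmpty())   // drop a possible leading empty token
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(sortedWordCounts(" Simple problem but tricky Problem "));
        // prints {but=1, problem=2, simple=1, tricky=1}
    }
}
```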
Solution 2
We could also take a more traditional for loop approach in the second solution.
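// Requires java.util.HashMap and java.util.Map
public static Map<Character, Integer> duplicateCharactersCounting(String words) {
    final Map<Character, Integer> map = new HashMap<>();
    // Strip whitespace and lowercase so that 'P' and 'p' share one key.
    for (char c : words.replaceAll("\\s", "").toLowerCase().toCharArray()) {
        map.compute(c, (k, v) -> (v == null) ? 1 : v + 1);
    }
    return map;
}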
Here, we use the key-value structure of the Map to store the characters as keys and their occurrences as values while we iterate over the given sentence. The .compute() method of the Map passes the current value for the key to the supplied function. If the character is already in the map, its count is incremented by 1; otherwise the value is null and the count is set to 1.
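Map.merge() is an alternative to compute() for this kind of counter: it stores the given value when the key is absent and otherwise combines the old and new values with the supplied function. A minimal sketch, with a hypothetical class and method name:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name, used only for this illustration.
class MergeCountSketch {

    // Counts non-whitespace characters, ignoring case, using Map.merge().
    static Map<Character, Integer> countWithMerge(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toLowerCase().toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue; // whitespace is not counted
            }
            // Absent key: store 1. Present key: add 1 to the stored count.
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWithMerge("problem Problem").get('p')); // prints 2
    }
}
```

Compared with compute(), merge() removes the explicit null check from the lambda.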
Similarly, we could adopt the same approach to count duplicate words in a sentence.
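// Requires java.util.HashMap and java.util.Map
public static Map<String, Integer> wordCount(String words) {
    final Map<String, Integer> map = new HashMap<>();
    // Split on runs of whitespace so repeated spaces do not produce empty words.
    String[] wordArray = words.trim().toLowerCase().split("\\s+");
    for (String word : wordArray) {
        Integer count = map.get(word);
        if (count == null) {
            map.put(word, 1);
        } else {
            map.put(word, count + 1);
        }
    }
    return map;
}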
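How the sentence is split matters once words are separated by more than one space. The sketch below compares splitting on a single literal space with splitting on the regular expression \s+; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical class name, used only for this illustration.
class SplitSketch {

    // Splits on every single space; consecutive spaces yield empty tokens.
    static String[] bySingleSpace(String text) {
        return text.split(" ");
    }

    // Splits on runs of whitespace (spaces, tabs, newlines).
    static String[] byWhitespaceRun(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "tricky  problem"; // two spaces between the words
        System.out.println(Arrays.toString(bySingleSpace(text)));   // prints [tricky, , problem]
        System.out.println(Arrays.toString(byWhitespaceRun(text))); // prints [tricky, problem]
    }
}
```

With the single-space split, the empty token would be counted as a word of its own.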
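When we run the methods using the given sentence “Simple problem but tricky problem”, we get the output below. The test string adds leading and trailing spaces and capitalises the last word, which shows that the methods ignore surrounding whitespace and letter case.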
public static void main(String[] args) {
    String sentence = " Simple problem but tricky Problem ";
    Map<String, Integer> wordCount = wordCount(sentence);
    System.out.println("wordCount : " + wordCount);
    Map<String, Integer> duplicateWordsCounting = duplicateWordsCounting(sentence);
    System.out.println("duplicateWordsCounting : " + duplicateWordsCounting);
    Map<Character, Integer> duplicateCharactersCounting = duplicateCharactersCounting(sentence);
    System.out.println("duplicateCharactersCounting : " + duplicateCharactersCounting);
    Map<Character, Long> duplicateCharactersCountingJ8 = duplicateCharactersCountingJ8(sentence);
    System.out.println("duplicateCharactersCountingJ8 : " + duplicateCharactersCountingJ8);
}
Output
wordCount : {but=1, problem=2, simple=1, tricky=1}
duplicateWordsCounting : {but=1, problem=2, simple=1, tricky=1}
duplicateCharactersCounting : {b=3, c=1, e=3, i=2, k=1, l=3, m=3, o=2, p=3, r=3, s=1, t=2, u=1, y=1}
duplicateCharactersCountingJ8 : {b=3, c=1, e=3, i=2, k=1, l=3, m=3, o=2, p=3, r=3, s=1, t=2, u=1, y=1}
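The output above lists every key, including those that occur only once. If only the repeated entries are of interest, one option is to filter the count map afterwards. A minimal sketch, assuming a generic helper (the class name and the duplicatesOnly method are hypothetical) that accepts both the Integer and the Long count maps:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
class DuplicatesOnlySketch {

    // Keeps only the entries whose count is greater than one.
    static <K, N extends Number> Map<K, N> duplicatesOnly(Map<K, N> counts) {
        return counts.entrySet().stream()
                .filter(entry -> entry.getValue().longValue() > 1)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        // Word counts as produced for the example sentence.
        Map<String, Integer> wordCounts = new HashMap<>();
        wordCounts.put("simple", 1);
        wordCounts.put("problem", 2);
        wordCounts.put("but", 1);
        wordCounts.put("tricky", 1);

        System.out.println(duplicatesOnly(wordCounts)); // prints {problem=2}
    }
}
```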
Please, feel free to leave a comment if you have a more scalable solution.